Abstract: A new negative result for nonparametric distribution estimation of binary ergodic processes is shown. We study the problem of estimating the distribution to any prescribed degree of accuracy, and show that for any countable class of estimators there is a zero-entropy binary ergodic process that is inconsistent with the class. Our result differs from other negative results on universal forecasting schemes for ergodic processes. We also introduce a related result by B. Weiss.

Index Terms: ergodic process, cutting and stacking, nonparametric estimation, computable function.

I. Introduction 

Let $X_1, X_2, \ldots$ be a binary-valued ergodic process and let $P$ be its distribution. In this paper we study nonparametric estimation of binary-valued ergodic processes to any prescribed degree of accuracy. Let $S$ and $\Omega$ be the set of finite binary strings and the set of infinite binary sequences, respectively. Let $A(x) := \{x\omega \mid \omega \in \Omega\}$, where $x\omega$ is the concatenation of $x \in S$ and $\omega$, and write $P(x) = P(A(x))$. For $x \in S$, $|x|$ is the length of $x$. Let $\mathbb{N}$, $\mathbb{Z}$, and $\mathbb{Q}$ be the set of natural numbers, the set of integers, and the set of rational numbers, respectively. By the ergodic theorem, there is a function $r$ such that for $x \in S$ and $n, k \in \mathbb{N}$,

$$P\Big(\bigcup\Big\{A(y) \;\Big|\; \Big|P(x) - \frac{1}{|y|-|x|+1}\sum_{i=1}^{|y|-|x|+1} I_{y_i^{i+|x|-1} = x}\Big| > \frac{1}{k},\ |y| = n\Big\}\Big) < r(n,k,x), \qquad \forall x, k\ \ \lim_{n\to\infty} r(n,k,x) = 0, \tag{1}$$



where $I$ is the indicator function and $y_i^j = y_i y_{i+1} \cdots y_j$ for $y = y_1 \cdots y_n$, $i \le j \le n$. The function $r$ is called the convergence rate. If $r$ is given, we know what sample size is necessary to estimate the distribution with prescribed accuracy. However, it is known that there is no universal convergence rate for the ergodic theorem. If $r$ is not known, the ergodic theorem does not help to estimate the distribution with prescribed accuracy.
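To make the quantity inside (1) concrete, the following Python sketch computes the empirical frequency of a pattern $x$ over the $|y| - |x| + 1$ sliding windows of a finite sample $y$. The function name `empirical_frequency` and the use of Python strings over the characters 0 and 1 to represent elements of $S$ are illustrative assumptions, not notation from the paper.

```python
from fractions import Fraction


def empirical_frequency(x: str, y: str) -> Fraction:
    """Fraction of the |y| - |x| + 1 sliding windows of y that equal x."""
    windows = len(y) - len(x) + 1
    if not x or windows <= 0:
        raise ValueError("need a nonempty pattern x with |x| <= |y|")
    # Count the positions i at which the window y[i : i + |x|] matches x.
    hits = sum(1 for i in range(windows) if y[i:i + len(x)] == x)
    return Fraction(hits, windows)


# The pattern "01" occurs in 3 of the 6 windows of "0101101".
print(empirical_frequency("01", "0101101"))  # 1/2
```

The result is an exact rational, matching the requirement that estimators take values in the rationals.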
Here a natural question arises: for any binary-valued ergodic process, is it always possible to estimate the distribution to any degree of accuracy with positive probability? We show that this problem has a negative answer, i.e., for any countable class of estimators there is a zero-entropy binary ergodic process that no estimator in the class estimates with positive probability. In particular, since the set of computable functions is countable, there is a zero-entropy binary ergodic process that is inconsistent with computable estimators. Our result is not derived from other negative results on universal forecasting schemes for ergodic processes; see Remark 2.
Write $x \sqsubseteq y$ if $x$ is a prefix of $y$. A function $f$ is called an estimator if



$$f(x,k,y) \in \mathbb{Q} \text{ is defined for } (x,k,y) \in S \times \mathbb{N} \times S \ \Rightarrow\ \forall z \sqsupseteq y\ \ f(x,k,z) = f(x,k,y). \tag{2}$$



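For $\omega \in \Omega$, let $f(x,k,\omega) := f(x,k,y)$ if $f(x,k,y)$ is defined and $y \sqsubseteq \omega$. We say that $f$ estimates $P$ if

$$P\Big(\omega \;\Big|\; \forall x, k\ \ f(x,k,\omega) \text{ is defined and } |P(x) - f(x,k,\omega)| < \frac{1}{k}\Big) > 0. \tag{3}$$

Here $\omega$ is a sample sequence, and the minimum length of $y \sqsubseteq \omega$ for which $f(x,k,y)$ is defined is a stopping time.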
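As an illustration of the definition, the sketch below builds a partial function that is undefined until the sample is long enough and afterwards depends only on a fixed-length prefix, so its value never changes on extensions, as (2) requires. The helper `make_estimator`, the rule `toy_schedule`, and the use of `None` for "undefined" are hypothetical choices made for this example. The schedule carries no accuracy guarantee.

```python
from fractions import Fraction
from typing import Callable, Optional

Estimator = Callable[[str, int, str], Optional[Fraction]]


def make_estimator(schedule: Callable[[str, int], int]) -> Estimator:
    """Build f(x, k, y); schedule(x, k) is a hypothetical sample-size rule."""
    def f(x: str, k: int, y: str) -> Optional[Fraction]:
        n = schedule(x, k)
        if n < len(x) or len(y) < n:
            return None  # undefined: the sample y is still too short
        prefix = y[:n]  # only the first n symbols are ever used
        windows = n - len(x) + 1
        hits = sum(1 for i in range(windows) if prefix[i:i + len(x)] == x)
        return Fraction(hits, windows)
    return f


def stopping_time(f: Estimator, x: str, k: int, sample: str) -> Optional[int]:
    """Least prefix length of `sample` at which f(x, k, .) is defined, if any."""
    for m in range(len(sample) + 1):
        if f(x, k, sample[:m]) is not None:
            return m
    return None


def toy_schedule(x: str, k: int) -> int:
    # Assumed for illustration only; not derived from any convergence rate.
    return 4 * k * k + len(x)


f = make_estimator(toy_schedule)
sample = "1101001110"
print(f("1", 1, sample[:4]))             # None
print(f("1", 1, sample[:5]))             # 3/5
print(f("1", 1, sample))                 # 3/5
print(stopping_time(f, "1", 1, sample))  # 5
```

Because the value is computed from `y[:n]` alone, every extension of a sample on which `f` is defined returns the same rational, and the first length at which `f` becomes defined plays the role of the stopping time.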

In this paper, we construct an ergodic process that is not estimated by any member of a given countable set of estimators:

Theorem 1.

$$\forall F : \text{countable set of estimators}\quad \exists P \text{ ergodic and zero entropy}\quad \forall f \in F$$

$$P\Big(\omega \;\Big|\; \forall x, k\ \ f(x,k,\omega) \text{ is defined and } |P(x) - f(x,k,\omega)| < \frac{1}{k}\Big) = 0.$$

We say that $P$ is effectively estimated if there is a partial computable $f$ that satisfies (2) and (3). Since the set of partial computable estimators is countable, we have

Corollary 1. There is a zero entropy ergodic process that is 
not effectively estimated. 

If $r$ in (1) is computable, then it is easy to see that $P$ is effectively estimated. For example, i.i.d. processes over a finite alphabet are effectively estimated; see Leeuw et al. [3].
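For single symbols of an i.i.d. binary process, one computable bound of this kind follows from Hoeffding's inequality, which gives $P(|P(1) - \frac{1}{n}\sum_{i=1}^{n} X_i| \ge 1/k) \le 2\exp(-2n/k^2)$. The sketch below, with hypothetical function names, turns that bound into a sample size for a target error probability `delta`. It covers only patterns $x$ of length 1 and is not the construction of the cited reference.

```python
import math


def hoeffding_rate(n: int, k: int) -> float:
    """Upper bound on P(|P(1) - mean of n i.i.d. bits| >= 1/k)."""
    return 2.0 * math.exp(-2.0 * n / (k * k))


def sample_size(k: int, delta: float) -> int:
    """Smallest n with hoeffding_rate(n, k) <= delta (assumes 0 < delta < 1)."""
    if k < 1 or not 0.0 < delta < 1.0:
        raise ValueError("need k >= 1 and 0 < delta < 1")
    # Solve 2 * exp(-2n / k^2) <= delta for n.
    return math.ceil(k * k * math.log(2.0 / delta) / 2.0)


n = sample_size(10, 0.05)
print(n, hoeffding_rate(n, 10))
```

With a computable bound like this in hand, an estimator can wait for `sample_size(k, delta)` symbols before answering, which is the sense in which a known rate tells us how large a sample is needed.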

As stated above, the difficulty of effectively estimating ergodic processes comes from the fact that there is no universal convergence rate for the ergodic theorem. In Shields, p. 171, it is shown that for any given decreasing function $r$, there is an ergodic process that satisfies

$$\exists N\ \forall n > N\quad P\Big(\Big|P(1) - \sum_{i=1}^{n} I_{X_i = 1}/n\Big| > \frac{1}{2}\Big) > r(n). \tag{4}$$

In particular, if $r$ is chosen so that it decreases to 0 asymptotically more slowly than any computable function, then $r$ is not computable. In V'yugin [9], a binary-valued computable stationary process with an incomputable convergence rate is shown.

It is possible that an ergodic process is effectively estimated 
even if the convergence rate is not computable. 

Theorem 2. For any decreasing $r$, there is a zero-entropy ergodic process that is effectively estimated and satisfies (4).

For proofs of Theorems 1 and 2, see [8].

Remark 1. (i) $P$ is computable $\Rightarrow$ (ii) the convergence rate $r$ in (1) is upper semi-computable (effectively approximated from above) $\Rightarrow$ (iii) $P$ is effectively estimated. None of the converses is true.

Remark 2. In Cover [2], two problems about prediction of ergodic processes are posed. Problem 1: Is there a universal scheme $f$ such that $\lim_{n\to\infty} |f(X_0^{n-1}) - P(X_n \mid X_0^{n-1})| = 0$ a.s. for all binary-valued ergodic $P$? Problem 2: Is there a universal scheme $f$ such that $\lim_{n\to\infty} |f(X_{-n}^{-1}) - P(X_0 \mid X_{-\infty}^{-1})| = 0$ a.s. for all binary-valued ergodic $P$? Problem 2 was affirmatively solved by Ornstein [5], [10]. Problem 1 has a negative answer, as follows (Bailey, Ryabko; see, e.g., [4]): for any $f$ there is a binary-valued ergodic process $X_1, X_2, \ldots$ such that

$$P\Big(\limsup_{n\to\infty} |f(X_0^{n-1}) - P(X_n \mid X_0^{n-1})| > 0\Big) > 0. \tag{5}$$

It is not difficult to see that the above result extends to a countable class $\{f_1, f_2, \ldots\}$, i.e., for any $\{f_1, f_2, \ldots\}$ there is an ergodic process such that (5) holds for all $f_1, f_2, \ldots$. However, this result does not imply Theorem 1. In fact, there is a finite-valued ergodic process that is effectively estimated but satisfies (5). Roughly speaking, one difference between these problems is that in Problem 1 we have to estimate $P(X_n \mid X_0^{n-1})$ from $X_0^{n-1}$, whereas in our estimation scheme the sample size is a stopping time and we can use a sufficiently large sample $X_0^m$, $m > n$, to estimate $P(X_0^n)$.

Remark 3 (B. Weiss). We say that $f : S \to [0, 1]$ is weakly universally consistent if, for every binary ergodic $P$, $\forall \epsilon > 0\ \exists N_\epsilon\ \forall n \ge N_\epsilon$

$$P(|P(1) - f(X_1^n)| < \epsilon) > 1 - \epsilon.$$

For example, $(X_1 + \cdots + X_n)/n$ is weakly universally consistent. Then for any weakly universally consistent $f$, for any increasing $n_1 < n_2 < \cdots$, and $\forall i\ \epsilon_i > 0$, there is a binary ergodic $P$ and an increasing sequence $N_1 < N_2 < \cdots$ such that $\forall i\ n_i < N_i$, $P(1) = \frac{1}{2}$, and

$$\forall k > 5\quad P\Big(\Big|\frac{1}{2} - f(X_1^{N_k})\Big| < \frac{1}{4}\Big) < \epsilon_k.$$

In the above, it is easy to extend the sample size to a stopping time and $f$ to a countable class of weakly universally consistent functions, i.e., for any countable class of weakly universally consistent functions $\{f_1, f_2, \ldots\}$ and for any $\epsilon_i > 0$, $i = 1, 2, \ldots$, there is a binary ergodic $P$ such that $P(1) = \frac{1}{2}$, and

$$\forall i\ \exists K\ \forall k \ge K\quad P\Big(\omega \;\Big|\; \Big|\frac{1}{2} - f_i(\omega)\Big| < \frac{1}{4}\Big) < \epsilon_k. \tag{6}$$



The difference between this result and Theorem 1 is that (6) requires the universality of $f$ but is a much stronger statement than the fact that $P$ is not estimated in the sense of Theorem 1.